Public transport networks: empirical analysis and modeling 
We use complex network concepts to analyze statistical properties of urban public transport networks (PTN). To this end, we present a comprehensive survey of the statistical properties of PTNs based on the data of fourteen cities of so far unexplored network size. Different network representations are especially helpful in our analysis. Within a comprehensive approach we calculate PTN characteristics in all of these representations and perform a comparative analysis. The standard network characteristics obtained in this way often correspond to features that are of practical importance to a passenger using public transport in a given city. We address specific features that are unique to PTNs and to networks with similar transport functions (such as networks of neurons, cables, pipes, or vessels embedded in 2D or 3D space). Based on the empirical survey, we propose a model that, although simple, is capable of reproducing many of the identified PTN properties. A central ingredient of this model is a growth dynamics in terms of routes represented by self-avoiding walks.



PACS numbers: 02.50.-r, 07.05.Rm, 89.75.Hc 



I. INTRODUCTION 

The general interest in networks of man-made and natural systems has led to a careful analysis of various network instances using empirical, simulational, and theoretical tools. The emergence of this field is sometimes referred to as the birth of network science [1, 2, 3, 4, 5].
In this paper, we use complex network concepts to analyze the statistical properties of public transport networks (PTN) of large cities. These constitute an example of transportation networks [3] and share general features of these systems: evolutionary dynamics, optimization, embedding in two-dimensional (2D) space. Other examples of transportation networks are given by the airport [6, 7, 8, 9, 10, 11, 12, 13], railway [14], or power grid networks [1, 15, 16].

While the evolution of a PTN of a given city is closely related to the growth of the city itself and is therefore influenced by numerous factors of geographical, historical, and social origin, there is ample evidence that PTNs of different cities share common statistical properties that arise due to their functional purposes [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. Some of these properties have been analyzed in earlier studies; the objective of the present study, however, is to present a comprehensive survey of characteristics of PTNs and to provide a comparative analysis. Based on this empirical survey we are in a position to propose a growth model that captures many of the main (statistical) features of PTNs.

FIG. 1: One of the networks we analyze in this study. The Los Angeles PTN consists of R = 1881 routes and N = 44629 stations, some of which are shown in this map (color online).

A further distinct feature of our study is that the PTNs we consider are networks of all means of public transport of a city (buses, trams, subway, etc.), regardless of the specific means of transport. A number of studies have analyzed specific sub-networks of PTNs
[n [H [H EQl E3 E2 EZ]. Examples are the Boston 
[HI UHl [H EQ] and Vienna [20] subway networks and the 
bus networks of several cities in China [24[E2]- However, 
each particular traffic system (e.g. the network of buses or trams, or the subway network) is not a closed system: it is a subgraph of a wider transportation system of a city, or as we call it here, of a PTN. Therefore, to understand and describe the properties of public transport in a city as a whole, one should analyze the complete network, without restriction to specific parts. Indeed, for the case of Boston it has been shown that the network properties change drastically when passing from the subway system to the network "subway + bus" [18, 19].

Urban public transport networks of general type have so far been analyzed mainly in two previous studies [21, 22]. In the first one, Ref. [21], the PTNs of Berlin, Düsseldorf, and Paris were examined, whereas Ref. [22] dealt with the public transport systems of 22 Polish cities. Ref. [21] concentrated on the scale-free properties. For the cities considered, the node degree distribution was shown to follow a power law. Moreover, power laws were found for a number of other specific features describing the traffic load on the PTN. However, the statistics in this study were too small to allow definite conclusions. In Ref. [22] it was found that the node degree distribution may follow a power law or be described by an exponential function, depending on the assumed network representation. In addition, a number of other network characteristics (clustering, betweenness, assortativity) were extensively analyzed.

In the present paper, we analyze PTNs of a number of major cities of the world (see Table I) [30, 31]. Our choice of this database was motivated by the requirement to collect network samples from cities of different geographical, cultural, and economic backgrounds. Our current analysis extends former studies [21] by considering cities with larger public transport systems (the typical number of stops in the systems considered in Ref. [22] was several hundred) as well as by systematically analyzing different representations. The idea of different network representations arises naturally in network science [1, 2, 3, 4, 5]. For a PTN the primary network topology is given by the set of routes, each servicing an ordered series of given stations (see Fig. 1 as an example). For the transportation networks studied so far, mainly two different neighborhood relations were used. In the first one, two stations are defined as neighbors only if one station is the successor of the other in the series serviced by a route [18, 19]. In the second one, two stations are neighbors whenever they are serviced by a common route [14]. We will exploit both representations in our study. Moreover, we introduce further natural representations (described in detail in Section II) which make the description of the PTNs of Table I comprehensive. In particular, this includes a bipartite graph representation of a transportation network that reflects its intrinsic features.

There is another reason to seek scale-free properties of PTNs in a larger database covering more cities with larger public transport systems. A currently well accepted mechanism to explain the abundant occurrence of power laws is that of preferential attachment or "rich gets richer" [32, 33, 34]. Since PTNs obviously are evolving networks, their evolution may be expected to follow similar underlying mechanisms. However, scale-free networks have also been shown to arise when minimizing both the effort for communication and the cost for maintaining connections [35, 36]. Moreover, this kind of optimization was shown to lead to small world properties [37] and to explain the appearance of power laws in a general context [38]. Therefore, scale-free behavior of PTNs may also be related to obvious objectives to optimize their operation.

City          A      P      N       R      S      Type
...           ...    3.7    2992    211    29.4   BSTU
Dallas        887    1.2    5366    ...    ...    ...
Los Angeles   ...    ...    44629   1881   52.9   ...
...           1523   10.9   7215    997    58.3   ...
Sydney        1687   3.6    1978    596    16.3   B
Taipei        2457   6.8    5311    389    70.5   B

Entries from incomplete rows: 59.9, 34.2 (between Dallas and Los Angeles); 22.2, 38.2, 3961, 26.8 (after Los Angeles).

TABLE I: Cities analyzed in this study. A: urban area (km²); P: population (million inhabitants); N: number of PTN stations; R: number of PTN routes; S: mean route length. Type of transport taken into account in the PTN database: Bus, Electric trolleybus, Ferry, Subway, Tram, Urban train (denoted by their initial letters).



This paper is organized as follows. In the next section (II) we define different representations in which the PTN will be analyzed; Sections III and IV explore the network properties in these representations. We separately analyze local characteristics, such as node degrees and clustering coefficients (Section III), and global characteristics, such as path length distributions and centralities (Section IV). Special attention is paid to characteristics that are unique to PTNs and networks with similar construction principles. An example is given by the analysis of sequences of routes that run in parallel along a given sequence of stations, a feature we call the 'harness' effect. A description of correlations between the properties of neighboring nodes in terms of generalized assortativities is given in Section V. Our findings for the statistics of real-world PTNs are supported by simulations of an evolutionary model of PTNs, presented in Section VI. Conclusions and an outlook are given in Section VII. Some of our results have been announced in preliminary form in Ref. [25].
FIG. 2: a: a piece of a public transport map. Stations A-F are serviced by the tram lines No 1 (solid line), No 2 (dashed line), and No 3 (dotted line). Taking all the lines to be indistinguishable, we call such a representation L'-space. b: L-space. c: P'-space. All stations that belong to the same route are connected. d: P-space. e: C'-space. Each route is represented by a node; each link corresponds to a common station shared by the route nodes it connects. f: C-space.



II. PT NETWORK TOPOLOGY 

Although everyone has an intuitive idea of what a PTN is, there are numerous ways to define its topology. Let us describe some of them, defining different 'spaces' in which public transport networks will be analyzed. A straightforward representation of a PT map in the form of a graph represents every station by a node, while any two nodes that are successively serviced by at least one route are linked by an edge, as shown in Fig. 2a. Note that the full information about the network of N stations and R routes is given by the set of ordered lists, each corresponding to one route or to one of the two directions of a given route. These simply list all stations serviced by that route in the order of service between two terminal stations or in the course of a round trip. Multiple entries of a given station in such a list are possible and do occur. Let us first introduce a simple graph to represent this situation. In the following we will refer to this graph as L-space [22]. This graph represents each station by a node; a link between two nodes indicates that there is at least one route that services the corresponding stations consecutively. No multiple links are allowed (see Fig. 2b). The neighbors of a given node in L-space represent all stations that are within reach of
a single station trip. For analyzing PTNs, the IL-space 
representation has been used in Refs. [181 Ell l22l ESI El] • 
Extending the notion of L-space, one may either introduce multiple links between nodes depending on the number of services between them or associate a corresponding weight with a single link. We will refer to such a representation as L'-space (cf. Fig. 2a).

A particularly useful concept for the description of connectivity in transport networks, which we refer to as P-space [22], was introduced in Ref. [14] and used in PTN
analysis in Refs. [20l EH [27]. In this representation 
the network is a graph where stations are represented by nodes that are linked if they are serviced by at least one common route. In the P-space representation the neighborhood of a given node represents all stations that can be reached without changing means of transport. The P-space concept may be extended to include multiple or weighted links. We refer to such a representation as P'-space (cf. Figs. 2c and 2d for P'-space and P-space, respectively).

A somewhat different concept is that of a bipartite space, which is useful in the analysis of cooperation networks [3, 39]. In this representation, which we call B-space, both routes and stations are represented by nodes
[241 ESI ES] • Each route node is linked to all station nodes 
that it services. No direct links between nodes of the same type occur (see Fig. 3). Obviously, in B-space the neighbors of a given route node are all stations that it services, while the neighbors of a given station node are all routes that service it.




FIG. 3: A bipartite graph of tram lines (filled circles) and stations (open circles) which corresponds to the public transport map of Fig. 2a. For the sake of illustration, links corresponding to different tram routes are drawn in different styles. However, neither the line type nor the order of the stations matters in this graph. Note that Figs. 2c - 2f are the one-mode projections of this bipartite graph.

We note that the one-mode projection of the bipartite graph of B-space onto the set of station nodes results in P-space, or in P'-space if we retain multiple links. The complementary projection onto route nodes leads to a graph which we call C-space (C'-space if multiple links are retained). In this space all nodes represent routes and the neighbors of any route node are those routes with which it shares a common station, see Figs. 2e and 2f.
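The representations introduced in this section can be illustrated by a minimal Python sketch. It assumes that each route is given as an ordered list of station labels; the function name `build_spaces` and the toy routes are hypothetical and do not reproduce the map of Fig. 2. Multiple links are discarded, so the sketch yields the simple L-, P-, B- and C-space graphs, not their primed variants.

```python
from itertools import combinations


def build_spaces(routes):
    """Build link sets of the L-, P-, B- and C-space graphs.

    routes: dict mapping a route id to the ordered list of stations it services.
    Returns four sets of links; station-station and route-route links are
    stored as frozensets, so direction and multiplicity are ignored.
    """
    l_space = set()  # stations serviced consecutively by at least one route
    p_space = set()  # stations serviced by at least one common route
    b_space = set()  # bipartite links (route, station)
    for route_id, stops in routes.items():
        for a, b in zip(stops, stops[1:]):
            if a != b:  # skip accidental repeated entries
                l_space.add(frozenset((a, b)))
        unique_stops = sorted(set(stops))
        for a, b in combinations(unique_stops, 2):
            p_space.add(frozenset((a, b)))
        for station in unique_stops:
            b_space.add((route_id, station))
    # C-space: two routes are linked if they share at least one station
    c_space = set()
    for r1, r2 in combinations(sorted(routes), 2):
        if set(routes[r1]) & set(routes[r2]):
            c_space.add(frozenset((r1, r2)))
    return l_space, p_space, b_space, c_space


# Hypothetical toy network with three routes and six stations
toy_routes = {
    1: ["A", "B", "C", "D"],
    2: ["A", "B", "E"],
    3: ["F", "C", "E"],
}
l_links, p_links, b_links, c_links = build_spaces(toy_routes)
print(len(l_links), len(p_links), len(b_links), len(c_links))
```

In this toy case every route enters P-space as a complete subgraph, which is why the P-space graph has many more links than the L-space graph built from the same routes.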

Below, we will study different features of the PT networks as they appear when represented in the spaces defined above. It is worthwhile to mention here that standard network characteristics represented in different spaces turn out to be natural characteristics one is interested in when judging the public transport of a given city. To give an example, the average length of a shortest path $\langle \ell \rangle$ in L-space, $\langle \ell_L \rangle$, gives the number of stops one has to pass on average to travel between any two stations. When represented in P-space, $\langle \ell_P \rangle$ tells how many changes one has to make to travel between any two stations. And, finally, $\langle \ell_C \rangle$ gives the number of changes one has to make to pass between any two routes. Another example is given by the node degree $k$: $k_{L'}$ tells in how many directions a passenger can travel at a given station; $k_L$ is the number of stops in the direct neighborhood; $k_P$ is the number of other stations reachable without changing a line; whereas $k_C$ tells how many routes are directly accessible from the given one.

TABLE II: PTN characteristics in different spaces (subscripts refer to L, P, and C-spaces, respectively). $k$: node degree (number of nearest neighbors $z_1$); $z = \langle z_2 \rangle / \langle z_1 \rangle$ ($z_2$ being the number of next nearest neighbors); $\ell^{\max}$, $\langle \ell \rangle$: maximal and mean shortest path length (10); $C^b$: betweenness centrality (23); $c$: ratio of the mean clustering coefficient to that of the classical random graph of equal size (8). Averaging has been performed with respect to the corresponding network; only the mean shortest path $\langle \ell \rangle$ is calculated with respect to the largest connected component.

Table II lists some of the PTN characteristics we obtained for the cities under consideration using publicly available data from the web pages of local transport organizations [30, 31]. A detailed analysis and discussion is given in the following Sections III to V.

III. LOCAL NETWORK CHARACTERISTICS 

Let us first examine local properties of the PTNs under discussion. Instead of looking at characteristics of individual nodes we will be interested in their mean values and statistical distributions. This approach allows us to derive conclusions that are significant for the global behavior of the given network. The simplest but highly important properties are those concerning the node degrees of a network and in particular their distribution. Early attempts to model complex networks were performed by mathematicians using the concept of random networks [40, 41], in which correlations are absent. A wealth of insight was gained by elaborating the theory on rigorous grounds and developing many concepts which remain at the core of network analysis. A random graph is given by a set of N nodes and M links. The nodes to which the two ends of each link are connected are chosen at random with constant probability 1/N. In case multiple links are excluded, the average number of neighbors $\langle z_1 \rangle$ is equal to the average node degree $\langle k \rangle$, which is:

$$\langle z_1 \rangle = \langle k \rangle = 2M/N. \tag{1}$$

For the node degree $k$ and its moments $k^m$ the average (1) can also be considered as an average with respect to the node degree distribution $p(k)$:

$$\langle k^m \rangle = \sum_{k=1}^{k^{\max}} p(k)\, k^m, \tag{2}$$

with the obvious notation $k^{\max}$ for the maximal node degree. In (2), $\langle \dots \rangle$ stands for an ensemble average over different network configurations. In the following, when analyzing empirical data, we will often use the same notation for an average over a large network instance. For classical random graphs of finite size the node degree distribution $p(k)$ is binomial; in the infinite case it becomes a Poisson distribution.
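As a numerical illustration of Eqs. (1) and (2), the following minimal Python sketch draws a random graph by attaching both ends of each of M links to uniformly chosen nodes and compares the moments obtained by direct averaging with those obtained from the degree distribution $p(k)$. The function names are hypothetical, and as a simplification self-loops and multiple links are not excluded, so the degree counts link ends rather than distinct neighbors.

```python
import random
from collections import Counter


def random_graph_degrees(n_nodes, n_links, seed=0):
    """Degrees of a random graph with n_nodes nodes and n_links links.

    Both ends of every link are attached to uniformly chosen nodes.
    Simplifying assumption: self-loops and multiple links are allowed.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    for _ in range(n_links):
        degree[rng.randrange(n_nodes)] += 1
        degree[rng.randrange(n_nodes)] += 1
    return degree


def moment_direct(degree, m):
    """<k^m> as a plain average over all nodes."""
    return sum(k ** m for k in degree) / len(degree)


def moment_from_distribution(degree, m):
    """<k^m> as a sum over the degree distribution p(k), cf. Eq. (2)."""
    n = len(degree)
    p = {k: count / n for k, count in Counter(degree).items()}
    return sum(p_k * k ** m for k, p_k in p.items())


N, M = 1000, 2000
deg = random_graph_degrees(N, M)
print(moment_direct(deg, 1), 2 * M / N)  # mean degree versus 2M/N
print(moment_direct(deg, 2), moment_from_distribution(deg, 2))
```

Every link contributes exactly two link ends, so the first moment equals 2M/N for any realization, whereas higher moments fluctuate from one realization to the next.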

In Figs. 4 and 5 we show node degree distributions for PTNs of several cities in L, P, and C-spaces. Note that to get smoother curves we plot, in the case of P and C-spaces, the cumulative distributions defined as:

$$P(k) = \sum_{q=k}^{k^{\max}} p(q). \tag{3}$$
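The cumulative distribution (3) is straightforward to compute from a list of node degrees. The following short Python sketch (with hypothetical function names) first builds $p(k)$ and then accumulates it from the largest degree downwards.

```python
from collections import Counter


def degree_distribution(degrees):
    """Return p(k) as a dict {k: fraction of nodes with degree k}."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}


def cumulative_distribution(p):
    """Return P(k) = sum of p(q) over all q >= k, cf. Eq. (3)."""
    cumulative = {}
    running = 0.0
    for k in sorted(p, reverse=True):  # start from the maximal degree
        running += p[k]
        cumulative[k] = running
    return cumulative


p = degree_distribution([1, 1, 2, 3])
P = cumulative_distribution(p)
print(p)
print(P)
```

Because every value of $P(k)$ sums over the whole tail of $p(k)$, statistical noise at large $k$ is averaged out, which is the reason for the smoother curves.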
FIG. 4: a: Node degree distributions of the PTNs of several cities in L-space. b: Cumulative node degree distributions in P-space. c: Cumulative node degree distributions in C-space. Berlin (circles, $\hat{k}_L = 1.24$, $\hat{k}_P = 39.7$), Düsseldorf (squares, $\hat{k}_L = 1.43$, $\hat{k}_P = 58.8$), Hong Kong (stars, $\hat{k}_L = 2.50$, $\hat{k}_P = 125.1$).

FIG. 5: a: Node degree distributions of the PTNs of several cities in L-space. b: Cumulative node degree distributions in P-space. c: Cumulative node degree distributions in C-space. London (circles, $\gamma_L = 4.48$, $\gamma_P = 4.39$), Los Angeles (stars, $\gamma_L = 4.85$, $\gamma_P = 3.92$), Paris (squares, $\gamma_L = 2.62$, $\gamma_P = 3.70$).



In Fig. 4 the data is shown in a log-linear plot together with fits (for L and P-spaces) to an exponential decay:

$$p(k) \sim \exp(-k/\hat{k}), \tag{4}$$

where $\hat{k}$ is of the order of the mean node degree. Within the accuracy of the data, both the L and P-space distributions for the cities analyzed in Fig. 4 are nicely fitted by an exponential decay. As far as the L-space data is concerned, we find evidence for an exponential decay for about half of the cities analyzed, while the other half rather shows a power law decay of the form:

$$p(k) \sim 1/k^{\gamma}. \tag{5}$$

Figs. 5a and 5b show the corresponding plots for three other cities on a log-log scale. Numerical values of the fit parameters $\hat{k}$ and $\gamma$ in (4), (5) for different cities are given in Table III. There, bracketed values indicate a less reliable fit. Note that for L-space the fit was done directly for the node degree distribution $p(k)$, whereas, due to substantial scatter of the data in P-space, the cumulative distribution (3) was fitted and the corresponding values of the fit parameters $\gamma_P$, $\hat{k}_P$ were extracted from those for the cumulative distributions.
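One simple way to obtain fit parameters of the kind defined in (4) and (5) is a least-squares straight-line fit on the appropriate scale: log-linear for the exponential decay and log-log for the power law. The Python sketch below is only an illustration under that assumption; the fitting procedure actually used for Table III is not specified in the text, and the function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(p):
    """Estimate k_hat in p(k) ~ exp(-k / k_hat) from ln p(k) versus k."""
    ks = [k for k, value in p.items() if value > 0]
    slope, _ = linear_fit(ks, [math.log(p[k]) for k in ks])
    return -1.0 / slope


def fit_power_law(p):
    """Estimate gamma in p(k) ~ k^(-gamma) from ln p(k) versus ln k."""
    ks = [k for k, value in p.items() if value > 0 and k > 0]
    slope, _ = linear_fit([math.log(k) for k in ks],
                          [math.log(p[k]) for k in ks])
    return -slope


# Synthetic (unnormalized) test data with known parameters
exp_data = {k: math.exp(-k / 2.5) for k in range(1, 21)}
pow_data = {k: k ** -3.0 for k in range(1, 21)}
print(fit_exponential(exp_data), fit_power_law(pow_data))
```

When the cumulative distribution is fitted instead of $p(k)$, an exponential decay keeps the same scale $\hat{k}$, while in the continuum approximation a power law $p(k) \sim k^{-\gamma}$ gives $P(k) \sim k^{-(\gamma - 1)}$, so the fitted slope has to be shifted by one to recover $\gamma$.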

While the node degree distributions of almost half of the cities in the L-space representation display a power law decay (5), this is not the case for the P-space. So far, the analysis of PTNs of smaller cities never showed
any power-law behavior in P-space [221 El] • The data for 
the three cities shown in Fig. 5b gives first evidence of power law behavior of $P(k)$ in the P-space representation. Previous results concerning node degree distributions of PTNs in L and P-spaces seemed to indicate that in general the degree distribution is power-law like in L-space and exponential in P-space. This was interpreted [22] as indicating strong correlations in L-space and random connections between the routes explaining the P-space behavior. Our present study, which includes a much less homogeneous selection of cities (Ref. [22] was based exclusively on Polish cities), shows that almost any combination of different distributions in L and P-spaces may occur. However, the three cities that show a power law distribution in P-space also exhibit power law behavior in L-space, as one can see by comparing Figs. 5a and 5b.

TABLE III: Parameters of the PTN node degree distributions fitted to exponential (4) and power law (5) behavior. Bracketed values indicate less reliable fits. Subscripts refer to L and P-spaces [31].

In C-space the decay of the node degree distribution is exponential or faster, as one can see from the plots in Figs. 4c and 5c. Of the cities presented there, only the PTNs of Berlin, London, and Los Angeles are governed by an exponential decay, so that their node degree distributions can be approximated by a straight line in the figures.

For most cities that show a power law degree distribution in L-space the corresponding exponent is $\gamma_L \sim 4$. Note that the exponents found for the PTNs of Polish cities of similar size N also lie in this region: $\gamma_L = 3.77$ for Krakow (with number of stations N = 940), $\gamma_L = 3.9$ for Lodz (N = 1023), $\gamma_L = 3.44$ for Warsaw (N = 1530) [22]. According to the general classification of scale-free networks, this indicates that in many respects these networks are expected to behave similarly to those with an exponential node degree distribution (for $\gamma > 3$ the second moment of the degree distribution remains finite). Prominent exceptions to this rule are provided by the PTNs of Paris ($\gamma_L = 2.62$) and São Paulo ($\gamma_L = 2.72$). Furthermore, values of $\gamma_L$ in the range 2.5 to 3.0 were recently reported for the bus networks of three cities in China: Beijing (N = 3938), Shanghai (N = 2063), and Nanjing (N = 1150) [27].

The connectivity within the closest neighborhood of a given node is described by the clustering coefficient, defined as

$$C_i = \frac{2 y_i}{k_i (k_i - 1)}, \qquad k_i \ge 2, \tag{6}$$

where $y_i$ is the number of links between the $k_i$ nearest neighbors of node $i$. The clustering coefficient of a node may also be defined as the probability that any two of its randomly chosen neighbors are connected. For the mean value of the clustering coefficient of a random graph one finds

$$\langle C \rangle_R = \frac{\langle k \rangle_R}{N} = \frac{2M}{N^2}. \tag{7}$$

FIG. 6: Mean clustering coefficient $\langle C_P(k) \rangle$ of several PTNs in P-space. Berlin (stars), London (triangles), Taipei (circles).
In Table II we give the values of the mean clustering coefficient in L, P, and C-spaces. The highest absolute values of the clustering coefficient are found in P-space, where their range is given by $\langle C_P \rangle = 0.7$ to $0.9$ (cf. $\langle C_L \rangle = 0.02$ to $0.1$). This is due to the fact that in this space each route gives rise to a fully connected subgraph (complete graph). In order to make the numbers comparable we normalize the value of $\langle C \rangle$ by the mean clustering coefficient (7) of a random graph of the same size:

$$c = N^2 \langle C \rangle / (2M). \tag{8}$$

In L and P-representations we find the mean clustering coefficient to be larger by orders of magnitude relative to the random graph. This difference is less pronounced in C-space, indicating a lower degree of organization in these networks. Furthermore, we find these values to vary strongly within the sample of the 14 cities. This suggests that the concepts according to which various PTNs are structured lead to a measurable difference in their organization.
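The normalized clustering coefficient (8) can be computed directly from an adjacency structure. The Python sketch below assumes the standard definition of the local clustering coefficient (number of links among the neighbors of a node divided by the number of neighbor pairs) and, as a convention chosen here, assigns zero to nodes with fewer than two neighbors. Function names are hypothetical.

```python
def clustering_coefficients(adj):
    """Local clustering coefficient of every node.

    adj: dict mapping each node to the set of its neighbours
         (simple undirected graph, no self-loops).
    Nodes with fewer than two neighbours get 0.0 (convention assumed here).
    """
    coeffs = {}
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            coeffs[node] = 0.0
            continue
        nb = list(neighbours)
        # y: number of links between the neighbours of this node
        y = sum(1 for a in range(k) for b in range(a + 1, k)
                if nb[b] in adj[nb[a]])
        coeffs[node] = 2.0 * y / (k * (k - 1))
    return coeffs


def normalized_mean_clustering(adj):
    """c = N^2 <C> / (2M): mean clustering relative to a random graph, Eq. (8)."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    mean_c = sum(clustering_coefficients(adj).values()) / n
    return n * n * mean_c / (2 * m)


# Triangle 1-2-3 with a pendant node 4 attached to node 3
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(clustering_coefficients(example))
print(normalized_mean_clustering(example))
```

For very small graphs such as this example the ratio $c$ is close to one; values larger by orders of magnitude only arise for large, sparse networks with many closed triangles.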

In P-space the clustering coefficient of a node is strongly correlated with the node degree. In Fig. 6 we show the mean clustering coefficient of nodes of degree $k$, $\langle C_P(k) \rangle$, as a function of $k$ for several PTNs. Its behavior can be understood as follows. Recall that in P-space the degree of a node (station) equals the number of stations that can be reached from a given one without changing routes. Each route enters the network as a complete graph, within which every node has a clustering coefficient of one. A small number $k$ of neighbors of a given station indicates that the station belongs to a single route (i.e. $\langle C_P(k) \rangle$ is most probably equal to one). For nodes with higher degrees $k$ it is more probable that they belong to more than one route. Consequently, $\langle C_P(k) \rangle$ decreases with $k$. The change in the behavior of $\langle C_P(k) \rangle$ should occur at some value of $k$ which is of the order of the mean number of stops of the routes. The prominent feature of the function $\langle C_P(k) \rangle$ in P-space is that it decays following a power law

$$\langle C_P(k) \rangle \sim k^{-\beta}. \tag{9}$$

Within a simple model of networks with star-like topology this exponent is found to be $\beta = 1$ [22]. In transport networks, this behavior was first observed for the Indian railway network [14] and then for the Polish PTNs [22]. In our case, the values of the exponent $\beta$ for the networks studied lie in the range from 0.65 (São Paulo) to 0.96 (Los Angeles) with a mean value of 0.82.



IV. GLOBAL CHARACTERISTICS 
A. Path length distribution 

Let $\ell_{ij}$ be the length of a shortest path between sites $i$ and $j$ in a given space. The mean shortest path is defined as

$$\langle \ell \rangle = \frac{2}{N(N-1)} \sum_{i>j=1}^{N} \ell_{ij}. \tag{10}$$



Note that $\langle \ell \rangle$ is well-defined only if nodes $i$ and $j$ belong to the same connected component of the network. In the following any expression as given in Eq. (10) will be restricted to this case. Furthermore, related network characteristics will be calculated for the largest (or giant) connected component, GCC. Correspondingly, $N$ denotes the number of nodes of this component. Denoting the path length distribution as $P(\ell)$, the average (10) reads

$$\langle \ell \rangle = \sum_{\ell=1}^{\ell^{\max}} \ell\, P(\ell), \tag{11}$$

where $\ell^{\max}$ is the maximal shortest path length found on the connected component. In Fig. 7 we plot the mean shortest path length distributions obtained in different spaces for several selected cities. Together with the data we plot a fit to the asymmetric unimodal distribution

$$P(\ell) = A \ell \exp(-B \ell^2 + C \ell), \tag{12}$$

where $A$, $B$, $C$ are fit parameters. As can be seen from the figures, the data is generally nicely reproduced by this ansatz. However, in certain networks additional features may lead to a deviation from this behavior, as can be seen from Fig. 8, which shows the mean shortest path length distribution in L-space, $P_L(\ell)$, for Los Angeles. One observes a second local maximum on the right shoulder of the distribution. Qualitatively, this behavior may be explained by assuming that the PTN consists of more than one community. For the simple case of one large community and a second smaller one at some distance, this situation will result in short intra-community paths, which give rise to a global maximum, and a set of longer paths that connect the larger to the smaller community, resulting in additional local maxima. Such a situation definitely appears to be present in the case of the Los Angeles PTN, see Fig. 1.
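A path length distribution restricted to the largest connected component can be obtained with breadth-first search from every node. The Python sketch below is a minimal illustration with hypothetical function names; it assumes an unweighted, undirected graph stored as a dict of neighbor sets and normalizes $P(\ell)$ to one, so that the mean follows as $\sum_\ell \ell P(\ell)$.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest path lengths from source to all nodes reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def largest_component(adj):
    """Node set of the largest connected component (GCC)."""
    seen, best = set(), set()
    for start in adj:
        if start not in seen:
            component = set(bfs_distances(adj, start))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def path_length_distribution(adj):
    """Normalized distribution of shortest path lengths on the GCC."""
    counts = Counter()
    for source in largest_component(adj):
        for length in bfs_distances(adj, source).values():
            if length > 0:
                counts[length] += 1  # every unordered pair is counted twice
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


def mean_path_length(distribution):
    """Mean shortest path length as the first moment of the distribution."""
    return sum(length * p for length, p in distribution.items())


# Chain 1-2-3-4 plus a separate pair 5-6 that is not part of the GCC
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
dist = path_length_distribution(graph)
print(dist, mean_path_length(dist))
```

Running one search per node costs of the order of N(N + M) operations, which remains feasible for networks with tens of thousands of stations in L-space but becomes expensive for the much denser P-space graphs.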

FIG. 8: Mean shortest path length distribution in L-space, $P_L(\ell)$, for the PTN of Los Angeles.

Let us introduce a characteristic that tells how remote a given node is from the other nodes of the network. For the node $i$ this may be characterized by the value:

$$\ell_i = \frac{1}{N-1} \sum_{j \neq i} \ell_{ij}. \tag{13}$$

Now, the mean shortest path (10) can be defined in terms of $\ell_i$ as:

$$\langle \ell \rangle = \frac{1}{N} \sum_{i=1}^{N} \ell_i. \tag{14}$$

In order to look for correlations between $\ell_i$ and the node degree $k_i$ let us introduce the value:

$$\ell(k) = \frac{1}{N_k} \sum_{i=1}^{N} \delta_{k,k_i}\, \ell_i, \tag{15}$$

where $N_k$ is the number of nodes of degree $k$ and $\delta_{k,k_i}$ is the Kronecker delta. Consequently, $\ell(k)$ is the mean shortest path length between any node of degree $k$ and the other nodes of the network. For the majority of the analyzed cities the dependence of the mean path $\ell_L(k)$ (15) on the node degree $k$ in L-space can be approximated by a power law

$$\ell_L(k) \sim k^{-\alpha_L}. \tag{16}$$

The value of the exponent varies in the range $\alpha_L = 0.17$ to $0.27$. We show this dependence for several cities in Fig. 9.
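The node-resolved mean path length and its average over nodes of equal degree can be sketched as follows. The sketch assumes a connected graph and normalizes the per-node value by N - 1, one choice that makes its average over all nodes coincide with the mean shortest path of Eq. (10); function names are hypothetical.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Shortest path lengths from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance_per_node(adj):
    """Mean distance from each node to all others (assumes a connected graph)."""
    n = len(adj)
    return {i: sum(bfs_distances(adj, i).values()) / (n - 1) for i in adj}


def mean_distance_by_degree(adj):
    """Average of the per-node mean distance over all nodes of degree k."""
    per_node = mean_distance_per_node(adj)
    groups = defaultdict(list)
    for node, neighbours in adj.items():
        groups[len(neighbours)].append(per_node[node])
    return {k: sum(values) / len(values) for k, values in sorted(groups.items())}


# Star graph: hub 0 connected to four leaves
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
per_node = mean_distance_per_node(star)
print(mean_distance_by_degree(star))
print(sum(per_node.values()) / len(per_node))  # mean shortest path
```

In the star example the hub of degree four is on average closer to the rest of the network than the leaves of degree one, which is the qualitative trend expressed by the decay of the mean path with the node degree.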

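A particular relation between path lengths and node degrees can be shown to hold, relating the mean path length between two nodes to the product of their node degrees. To this end let us define

$$\ell(k,q) = \frac{\sum_{i \neq j} \delta_{k,k_i}\, \delta_{q,k_j}\, \ell_{ij}}{\sum_{i \neq j} \delta_{k,k_i}\, \delta_{q,k_j}}. \tag{17}$$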



As has been shown in [42], this relation can be approximated by

$$\ell(k,q) = A - B \log(kq). \tag{18}$$



FIG. 9: Mean path $\ell_L(k)$ (15) in L-space as a function of the node degree $k$ with a fit to the power law decay (16). a: Berlin, $\alpha_L = 0.23$; b: Hong Kong, $\alpha_L = 0.25$; c: Paris, $\alpha_L = 0.15$; d: Taipei, $\alpha_L = 0.23$.



For random networks the coefficients $A$ and $B$ can be calculated exactly [43]. The validity of Eq. (18) was checked on the basis of PTNs of some Polish cities, and rather good agreement was found in L-space for the majority of the cities. In our analysis, which concerns PTNs of much larger size, we do not observe the same good agreement for all cities. We also find the suggested logarithmic dependence (18) in L-space for the larger cities, however with a much more pronounced scatter of the data for large values of the product $kq$. In Fig. 10 we plot the mean path $\ell_L(k,q)$ in L-space for the PTNs of Berlin, Hong Kong, Rome, and Taipei. Note, however, that due to the scatter of the data a logarithmic dependence is frequently indistinguishable from a power law with a small exponent.

In P-space, the shortest path length $\ell_{ij}$ gives the minimal number of routes required to reach site $j$ starting from site $i$. In turn, Eq. (13) defines the number of routes one uses on average when traveling from site $i$ to any other node of the network. The higher the node degree, the easier it is to access other routes in the network. Therefore, also in P-space one expects a decrease of $\ell_P(k)$ when $k$ increases. This is shown for several cities in Fig. 11. Besides the expected decrease of $\ell_P(k)$, one can see a tendency towards a power-law decay

$$\ell_P(k) \sim k^{-\alpha_P}. \tag{19}$$

The value of the exponent $\alpha_P$ varies in the interval from $\alpha_P = 0.09$ (for Sydney) to $\alpha_P = 0.17$ (for Dallas) and is centered around $\alpha_P = 0.12$ to $0.13$, as shown for the cities in Fig. 11. The mean path $\ell_P(k,q)$ as a function of $kq$ for several cities is given in P-space in Fig. 12. The scatter of the data is much more pronounced than in L-space. However, one distinguishes a tendency of $\ell_P(k,q)$ to decrease with an increase of $kq$. The red lines in Fig. 12 are guides to the eye characterizing the decay.

FIG. 10: Mean path $\ell_L(k,q)$ (17) in L-space as a function of $kq$ for the PTNs of Berlin (stars), Hong Kong (circles), Rome (triangles), and Taipei (squares).

FIG. 11: Mean path $\ell_P(k)$ in P-space as a function of the node degree $k$ and its fit to the power law decay (19). a: Berlin, $\alpha_P = 0.13$; b: Hong Kong, $\alpha_P = 0.12$; c: Paris, $\alpha_P = 0.13$; d: Taipei, $\alpha_P = 0.12$.

FIG. 12: Mean path $\ell_P(k,q)$ in P-space for the PTNs of Berlin (a) and Paris (b) as a function of $kq$.



B. Centralities 



To measure the importance of a given node with respect to
different properties of a graph, a number of so-called
centrality measures have been introduced [44, 45, 46, 47, 48].
Most of them are based either on measuring path lengths to
other nodes or on counting the number of paths between other
nodes mediated by this node. The closeness $C^c(i)$ [45] and
graph $C^g(i)$ [46] centralities of a node i are based on the
shortest path lengths $\ell_{ij}$ to other nodes j:

$C^c(i) = \frac{1}{\sum_{j \neq i} \ell_{ij}}$, (20)

$C^g(i) = \frac{1}{\max_{j \neq i} \ell_{ij}}$. (21)



Only nodes j that belong to the same connected component as i
contribute to (20), (21). For a given node these properties
obviously depend on the size of the connected component to
which the node belongs. The importance of the node i with
respect to the connectivity within the graph may be measured
in terms of the number of shortest paths $\sigma_{jk}(i)$
between nodes j and k that go via node i. Denoting by
$\sigma_{jk}$ the overall number of shortest paths between
nodes j and k, one defines the stress $C^s(i)$ [47] and
betweenness $C^b(i)$ [48] centralities by:

$C^s(i) = \sum_{j \neq i \neq k} \sigma_{jk}(i)$, (22)

$C^b(i) = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}$. (23)



Numerical values of the betweenness centrality (23) are
given in Table II in L, P and C-spaces.
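
As an illustration of the four centralities of Eqs. (20)-(23), the following minimal sketch computes them for a small connected, unweighted graph by breadth-first search. The function names `bfs` and `centralities` and the adjacency-list input format are assumptions made for this example. The sums in (22), (23) are taken here over ordered pairs (j, k), so each unordered pair contributes twice; other normalizations differ by a constant factor.

```python
from collections import deque

def bfs(adj, s):
    """Shortest-path lengths from s and numbers of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def centralities(adj):
    """Return {i: (closeness, graph, stress, betweenness)} for a connected graph."""
    info = {s: bfs(adj, s) for s in adj}
    result = {}
    for i in adj:
        dist_i, sigma_i = info[i]
        others = [j for j in dist_i if j != i]
        closeness = 1.0 / sum(dist_i[j] for j in others)   # Eq. (20)
        graph = 1.0 / max(dist_i[j] for j in others)       # Eq. (21)
        stress, betweenness = 0, 0.0
        for j in others:
            dist_j, sigma_j = info[j]
            for k in others:
                if k == j:
                    continue
                # i lies on a shortest j-k path iff the distances add up
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    through = sigma_j[i] * sigma_i[k]
                    stress += through                       # Eq. (22)
                    betweenness += through / sigma_j[k]     # Eq. (23)
        result[i] = (closeness, graph, stress, betweenness)
    return result

# Three stations in a line: the middle one mediates the only 0-2 path.
example = centralities({0: [1], 1: [0, 2], 2: [1]})
```

The all-pairs approach above is quadratic in memory and only meant for small graphs; for networks of the size discussed here a dedicated algorithm would be preferable.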

Averaging the two centralities that are based on path
length, (20) and (21), one obtains values that are closely
related to the average shortest path length on the GCC.
Since this relation is independent of the representation of
the PTN, we find a very similar correspondence between
$\langle\ell\rangle$ and the mean centralities
$\langle C^c\rangle$, $\langle C^g\rangle$ in all spaces
considered, as shown in Fig. 13. The fact that these
centralities are based on the inverse path length is
reflected by the negative slope of the curves shown in the
figures.



The betweenness centrality (23) and the related stress
centrality (22) of a given node measure the share of the
shortest paths between nodes that are mediated by that node.
It is obvious that a node with a high degree has

FIG. 13: Correspondence between the mean shortest path
$\langle\ell\rangle$ and the mean centralities
$\langle C^c\rangle$ (open circles), $\langle C^g\rangle$
(filled circles) for all fourteen PTNs listed in Table I in
(a) L, (b) P, and (c) C-spaces.



a higher probability to be part of any path connecting other
nodes. This relation between $C^b$ and the node degree may be
quantified by observing their correlation. In Fig. 14 we plot
the mean betweenness centrality $\langle C^b(k)\rangle$ of all
nodes that have a given degree k. There, we present results
for the PTN of Paris in L, C, P, and B-spaces. The
betweenness-degree correlation is especially well expressed
in P-space (Fig. 14c) and, with somewhat less precision, in
C-space (Fig. 14b). In both cases there is a clear tendency
to a power law $\langle C^b(k)\rangle \sim k^{\eta}$ with an
exponent $\eta = 2$ to $3$. Let us note here that this power
law together with the scale-free behavior of the degree
distribution implies that the betweenness distribution should
also follow a power law with an exponent $\delta$. This
behavior is clearly identified in Fig. 15 for the P-space
betweenness distribution of the Paris PTN, for which we find
an exponent $\delta \simeq 1.5$. The resulting scaling
relation [49]

$\eta = (\gamma - 1)/(\delta - 1)$ (24)



is fulfilled within the accuracy for these exponents. In the
plots for both B and P-spaces we observe the occurrence of
two regimes which correspond to small and large degrees k.
This separation, however, has a different origin in each of
these cases. In the B-space representation, the network
consists of nodes of two types, route nodes and station
nodes. Typically, station nodes are connected only to a low
number of routes while there is a minimal number of stations
per route. One may thus identify the low degree behavior as
describing the betweenness of station nodes, while the high
degree behavior corresponds to that of route nodes. In the
overlap region of the two regimes one may observe that, when
having the same degree, station nodes have a higher
betweenness than route nodes. The occurrence of two regimes
in the P-space representation has a similar origin as the
change of behavior observed for the mean clustering
coefficient $\langle C_P(k)\rangle$, see Fig. 6. Namely,
stations with low degrees in general belong only to a single
route and thus are of low importance for the connectivity
within the network, resulting in a low betweenness
centrality. Comparing our results with those of Ref. [22],
we do not, however, find a saturation for the low k region,
as observed there. Similar betweenness




FIG. 14: Mean betweenness centrality $\langle C^b(k)\rangle$
- degree k correlations for the PTN of Paris in (a) L, (b) C,
(c) P, and (d) B-spaces.

FIG. 15: Betweenness centrality distribution for the PTN of
Paris in P-space.



$\langle C^b(k)\rangle$ - degree relations to those observed
in Fig. 14 for the PTN of Paris are also found for most of
the other cities, however, with different quality of
expression.

FIG. 16: Cumulative harness distributions for the PTN of
Istanbul (a) and Moscow (b).

FIG. 17: Cumulative harness distributions for Los Angeles,
a: log-log scale; b: log-linear scale.
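
The scaling relation (24) can be traced back to a change of variables between the degree and betweenness distributions. The short derivation below is a sketch under the assumptions that $\gamma$ denotes the exponent of the degree distribution, $P(k) \sim k^{-\gamma}$, and that the betweenness may be treated as a monotonic function of the degree, $C^b \sim k^{\eta}$.

```latex
\begin{align*}
P(k)\,dk &= P(C^b)\,dC^b,
  \qquad P(k) \sim k^{-\gamma},\quad
  P(C^b) \sim (C^b)^{-\delta},\quad
  C^b \sim k^{\eta},\\
k^{-\gamma} &\sim k^{-\eta\delta}\, k^{\eta - 1}
  \;\Longrightarrow\;
  \gamma = \eta(\delta - 1) + 1
  \;\Longrightarrow\;
  \eta = \frac{\gamma - 1}{\delta - 1}.
\end{align*}
```

With $\delta \simeq 1.5$ and $\eta = 2$ to $3$ as quoted above, the relation corresponds to $\gamma = 1 + \eta(\delta - 1) = 2$ to $2.5$.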



C. Harness 

Besides the local and global properties of networks described
above, which can be defined in any type of network, there are
some characteristics that are unique to PTNs and networks
with similar construction principles. A particularly striking
example is the fact that, since the routes share the same
grid of streets and tracks, often a number of routes will
proceed in parallel along shorter or longer sequences of
stations. Similar phenomena are observed in networks built
with real space consuming links such as cables, pipes,
neurons, etc. In the present case this behavior may be easily
worked out on the basis of sequences of stations serviced by
each route. To quantify this behavior, the notion of network
harness has recently been introduced [25]. It is described by
the harness distribution P(r, s): the number of sequences of
s consecutive stations that are serviced by r parallel
routes. Similarly to the node-degree distributions, we
observe that the harness distribution for some cities (Hong
Kong, Istanbul, Paris, Rome, São Paulo, Sydney) may be fitted
by a power law:



$P(r, s) \sim r^{-\gamma_s}$, for fixed s, (25)

whereas the PTNs of other cities (Berlin, Dallas, Düsseldorf,
London, Moscow) are better fitted to an exponential decay:

$P(r, s) \sim \exp(-r/\hat{r}_s)$, for fixed s. (26)

As examples we show the harness distribution for Istanbul
(Fig. 16a) and for Moscow (Fig. 16b). Moreover, sometimes (we
observe this for Los Angeles and Taipei), for larger s the
regime (25) changes to (26). We show this for the PTN of Los
Angeles in Fig. 17. There, one can see that for small values
of s the curves are better fitted to a power law dependence
(25). With increasing s a tendency to an exponential decay
(26) appears: Fig. 17b.

As one can observe from Figs. 16, 17, the slope of the
harness distribution P(r, s) as a function of the number of
routes r increases with an increase of the sequence length s.
For PTNs for which the harness distribution follows the power
law (25), the corresponding exponents $\gamma_s$ are found in
the range $\gamma_s = 2$ to $4$. For those distributions with
an exponential decay, the scale $\hat{r}_s$ (26) varies in
the range $\hat{r}_s = 1.5$ to $4$. The power laws observed
for the behavior of P(r, s) indicate a certain level of
organization and planning which may be driven first by the
need to minimize the costs of infrastructure and second by
the fact that points of interest tend to be clustered in
certain locations of a city. Note that this effect may be
seen as a result of the strong interdependence of the
evolutions of both the city and its PTN.
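
The harness distribution can be worked out directly from the station sequences of the routes. The following minimal sketch counts, for a fixed s, how many distinct sequences of s consecutive stations are serviced by exactly r routes. The function name `harness_distribution`, the toy routes, and two conventions are assumptions of this example rather than statements about the procedure of Ref. [25]: a sequence and its reverse are identified, and a route counts at most once per sequence. The result is the non-cumulative distribution, whereas Figs. 16 and 17 show cumulative ones.

```python
from collections import Counter

def harness_distribution(routes, s):
    """Return {r: number of distinct s-station sequences serviced by exactly r routes}."""
    served_by = Counter()
    for route in routes:
        windows = set()
        for start in range(len(route) - s + 1):
            seq = tuple(route[start:start + s])
            # Identify a sequence with its reverse (direction-independent key).
            windows.add(min(seq, tuple(reversed(seq))))
        served_by.update(windows)  # each route counts once per sequence
    return dict(Counter(served_by.values()))

# Three toy routes sharing the stretch B-C-D (station labels are hypothetical).
toy_routes = [
    ["A", "B", "C", "D"],
    ["X", "B", "C", "D"],
    ["D", "C", "B", "Y"],
]
pairs = harness_distribution(toy_routes, 2)
triples = harness_distribution(toy_routes, 3)
```

Here the stretch B-C-D is followed by all three routes, so it appears as one sequence with r = 3 for s = 3 and as two such sequences (B-C and C-D) for s = 2.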

We want to emphasize that the harness effect is a feature of
the network given in terms of its routes but it is invisible
in any of the graph representations presented so far. In
particular, PTN representations in terms of a simple graph
which does not contain multiple links (such as L, P, C and
B-spaces) cannot be used to extract harness behavior.
Furthermore, the multi-graph representations (such as L', P',
and C'-spaces) would need to be extended to account for the
continuity of routes. As noted above, the notion of harness
may also be useful for the description of other networks with
similar properties. On the one hand, the harness distribution
is closely related to distributions of flow and load on the
network. On the other hand, in the situation of
space-consuming links (such as tracks, cables, neurons,
pipes, vessels) the information about the harness behavior
may be important with respect to the spatial optimization of
networks.

A generalization may be readily formulated to account 
for real-world networks in which links (such as cables) 
are organized in parallel over a certain spatial distance. 
While for the PTN this distance is simply measured by 
the length of a sequence of stations, a more general mea- 
sure would be the length of the contour along which these 
links proceed in parallel. 



V. GENERALIZED ASSORTATIVITIES 

To describe correlations between the properties of
neighboring nodes in a network, the notion of assortativity
was introduced, measuring the correlation between the node
degrees of neighboring nodes in terms of the mean Pearson
correlation coefficient [50, 51]. Here, we propose to
generalize this concept to also measure correlations between
the values of other node characteristics (other observables).
For any link i let $X_i$ and $Y_i$ be the values of the
observable at the two nodes connected by this link. Then the
correlation coefficient is given by:

$r = \frac{M^{-1}\sum_i X_i Y_i - \left[M^{-1}\sum_i \frac{1}{2}(X_i + Y_i)\right]^2}{M^{-1}\sum_i \frac{1}{2}(X_i^2 + Y_i^2) - \left[M^{-1}\sum_i \frac{1}{2}(X_i + Y_i)\right]^2}$, (27)

where summation is performed with respect to the M links of
the network. Taking $X_i$ and $Y_i$ to be the node degrees,
Eq. (27) is equivalent to the usual formula for the
assortativity of a network [50]. Here we will call this
special case the degree assortativity $r^{(1)}$. In the
following we will investigate correlations between other
network characteristics such as the observables considered
above: the number of next nearest neighbors $z_2$, the
clustering coefficient C, and the centralities $C^c$ (20),
$C^g$ (21), $C^s$ (22), $C^b$ (23). Consequently, this
results in generalized assortativities of next nearest
neighbors ($r^{(2)}$), clustering coefficients ($r^{cl}$),
closeness ($r^c$), graph ($r^g$), stress ($r^s$), and
betweenness ($r^b$) centralities.
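
A generalized assortativity of this kind can be evaluated for any node observable once the list of links is known. The sketch below assumes that Eq. (27) has the standard symmetric Pearson form used for the degree assortativity in Ref. [50]; the function names `generalized_assortativity` and `degrees` and the toy graphs are illustrative assumptions. The denominator vanishes when the observable takes the same value at all link ends (for example, degrees in a regular graph), in which case r is undefined.

```python
def generalized_assortativity(edges, value):
    """Pearson correlation of a node observable across the M links of a graph."""
    m = len(edges)
    mean_xy = sum(value[u] * value[v] for u, v in edges) / m
    mean_half = sum(0.5 * (value[u] + value[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (value[u] ** 2 + value[v] ** 2) for u, v in edges) / m
    return (mean_xy - mean_half ** 2) / (mean_sq - mean_half ** 2)

def degrees(edges):
    """Node degrees of an undirected simple graph given as a list of links."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Star graph: every link joins the hub to a leaf, i.e. fully disassortative.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
r_star = generalized_assortativity(star, degrees(star))

# Chain of four nodes with degrees 1, 2, 2, 1.
chain = [(0, 1), (1, 2), (2, 3)]
r_chain = generalized_assortativity(chain, degrees(chain))
```

Passing a dictionary of another observable (for instance a centrality per node) in place of the degrees gives the corresponding generalized assortativity.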

The numerical values of the above introduced assortativities
$r^{(1)}$ and $r^{(2)}$ for the PTNs under discussion are
listed in Table IV in L, P and C-spaces. With respect to the
values of the standard node degree assortativity $r^{(1)}_L$
in L-space, we find two groups of cities. The first is
characterized by values $r^{(1)}_L = 0.1$ to $0.3$. Although
these values are still small, they signal a finite preference
for assortative mixing. That is, links tend to connect nodes
of similar degree. In the second group of cities these values
are very small, $r^{(1)}_L = -0.02$ to $0.08$, showing no
preference in linkage between nodes with respect to node
degrees. PTNs of both large and medium sizes are present in
each of the groups. This indicates the absence of
correlations between network size and degree assortativity
$r^{(1)}_L$ in L-space. Measuring the same quantity in the P
and C-spaces, we observe different behavior. In P-space
almost all cities are characterized by very small (positive
or negative) values of $r^{(1)}_P$ with the exception of the
PTNs of Istanbul ($r^{(1)}_P = -0.12$) and Los Angeles
($r^{(1)}_P = 0.12$). On the contrary, in C-space PTNs
demonstrate clear assortative mixing with $r^{(1)}_C = 0.1$
to $0.5$. An exception is the PTN of Paris with
$r^{(1)}_C = 0.06$.

As we have seen above, the PTNs demonstrate assortative
($r^{(1)} > 0$) or neutral ($r^{(1)} \approx 0$) mixing with
respect to the node degree (number of first nearest
neighbors) k. Calculating the assortativity $r^{(2)}$ with
respect to the number of second nearest neighbors, we explore
the correlation of a wider environment of adjacent nodes.
Due to the fact that in this case the two connected nodes
share at least part of this environment (the first nearest
neighbors of a node form part of the second nearest neighbors
of the adjacent node) one may expect the assortativity
$r^{(2)}$ to be non-negative. Results for $r^{(2)}$ shown in
Table IV appear to confirm this assumption. In all the spaces
considered, we find that all PTNs that belong to the group of
neutral mixing with respect to k also belong to the same



City          r(1)_L   r(2)_L   r(1)_P   r(2)_P   r(1)_C   r(2)_C
Berlin         0.158    0.616    0.065    0.441    0.086    0.318
Dallas         0.150    0.712    0.154    0.728    0.290    0.550
Düsseldorf     0.083    0.650    0.041    0.494    0.244    0.180
Hamburg        0.297    0.697    0.087    0.551    0.246    0.605
Hong Kong      0.205    0.632   -0.067    0.238    0.131    0.087
Istanbul       0.176    0.726   -0.124    0.378    0.282    0.505
London         0.221    0.589    0.090    0.470    0.395    0.620
Los Angeles    0.240    0.728    0.124    0.500    0.465    0.753
Moscow         0.002    0.312   -0.041    0.296    0.208    0.011
Paris          0.064    0.344   -0.010    0.258    0.060   -0.008
Rome           0.237    0.719    0.044    0.525    0.384    0.619
São Paulo     -0.018    0.437   -0.047    0.266    0.211    0.418
Sydney         0.154    0.642    0.077    0.608    0.458    0.424
Taipei         0.270    0.721    0.009    0.328    0.100    0.041

TABLE IV: Nearest neighbors and next nearest neighbors
assortativities $r^{(1)}$ and $r^{(2)}$ in different spaces
(L, P, and C) for the whole PTN.



group with respect to the second nearest neighbors. For
those PTNs that display significant nearest neighbors
assortativity $r^{(1)}$, we find that the second nearest
neighbor assortativity $r^{(2)}$ is in general even stronger,
in line with the above reasoning.

Recall that both closeness and graph centralities $C^c$ and
$C^g$ are measured in terms of path lengths, Eqs. (20), (21).
It is natural to expect that adjacent nodes will have very
similar (or almost identical) centralities $C^c$ and $C^g$.
In turn this will lead to strong assortative mixing with
high assortativities $r^c$ and $r^g$. This assumption holds
only if the average path length in the network is
sufficiently large. The latter is certainly the case for
PTNs in L-space but it does not hold in P and even less in
C-spaces. Indeed, in L-space, where most PTNs display a mean
path length $\langle\ell_L\rangle > 10$ (see Table II), we
find values of $r^c_L$ in the range $r^c_L = 0.904$ to
$0.998$ ($r^g_L = 0.914$ to $0.999$). Exceptions are the two
PTNs of cities with the smallest mean paths. These are
Moscow ($r^c_L = 0.865$, $r^g_L = 0.870$) with
$\langle\ell_L\rangle = 7.0$ and Paris ($r^c_L = 0.831$,
$r^g_L = 0.800$) with $\langle\ell_L\rangle = 6.4$.

In P and C-spaces, where the mean path lengths are much
shorter (of the order of three in P and of the order of two
in C-spaces), the one-step difference in path length between
adjacent nodes leads to much reduced assortative mixing.
Numerically this is reflected in much lower (however
positive) values of the corresponding assortativities for
PTNs where $\langle\ell\rangle$ is especially small. Indeed,
for all PTNs that display in P-space a mean path length
$\langle\ell_P\rangle < 2.7$ we find $r^c_P < 0.5$
($r^g_P < 0.4$). At the same time, PTNs with larger
$\langle\ell_P\rangle$ may display larger assortativities
even in P-space. The extreme example is Los Angeles with
$\langle\ell_P\rangle = 4.3$ and $r^c_P = 0.914$,
$r^g_P = 0.844$. In C-space, where vertices are routes, the
mean path length is even smaller and a further reduction of
closeness and graph centrality assortativities is observed.
For five PTNs we find in C-space $\langle\ell_C\rangle < 2$
(see Table II) and for these $r^c_C < 0.3$, $r^g_C < 0.3$.
Again the largest values are attained in the Los Angeles PTN
with $\langle\ell_C\rangle = 3.4$ and $r^c_C = 0.828$,
$r^g_C = 0.648$.

For the other generalized assortativities (stress and
betweenness centrality assortativities $r^s$ and $r^b$ and
clustering coefficient assortativity $r^{cl}$) we in general
find no evidence for any (positive or negative) correlation
in any of the spaces considered. The only exceptions are the
stress and betweenness centrality assortativities in L-space,
$r^s_L$ and $r^b_L$. There, small but significantly positive
values of $r^s_L$ and $r^b_L$ are found. The latter is
explained by the relatively large mean path length in this
space in conjunction with relatively small node degree
values. Let us recall that stress and betweenness
centralities essentially count the number of shortest paths
mediated by a given node. If a selected node is a part of
many such long paths while having low degree, there is a high
probability that any of its neighbors will also be a part of
these paths. Consequently, a positive value of $r^s_L$
($r^b_L$) will arise. The analogous conclusion can be drawn
for nodes with low betweenness (or stress) centralities. For
most PTNs the values of the assortativities under
consideration lie in the range $r^s_L = 0.26$ to $0.64$,
$r^b_L = 0.20$ to $0.61$. Exceptions are the PTNs which in
L-space have a mean path length $\langle\ell_L\rangle < 10$,
namely Moscow, Paris and São Paulo. There we find
$r^s_L = 0.02$ to $0.10$, $r^b_L = 0.02$ to $0.10$.



VI. MODELING PTNS

A. Motivation and description of the model

Having at hand the above described wealth of empirical data
and analysis with respect to typical scenarios found in a
variety of real-world PTNs, we are in a position to propose
a model that, albeit being simple, may capture the
characteristic features of these networks. Nonetheless it
should be capable of discriminating between some of the
various scenarios observed.

If we were only to reproduce the degree distribution of the
network, standard models such as random networks [52] or
preferential attachment type models [31 [SllMl ESI |5l] would
suffice. The evolution of such networks, however, is based on
the attachment of nodes. For the description of PTNs the
concept of routes as finite sequences of stations is
essential [SJ [23l |25l [28] and allows for the representation
with respect to the spaces defined above. Moreover, taking a
route as the essential element of PTN growth allows one to
account for the essential bipartite structure of this network
[20l [H HI [57]. Therefore, the growth dynamics in terms of
routes will be a central ingredient of our model. Another
obvious requirement is the embedding of this model in
two-dimensional space. To simplify matters we will restrict
the model to a two-dimensional grid, in particular to a
square lattice. Both the observations of power law degree
distributions as well as the occurrence of the corresponding
harness distributions described above indicate a preference
of routes to service common stations (i.e. an attraction
between routes).

Let us describe our model in more detail. As noticed above,
a route will be modeled as a sequence of stations that are
adjacent nodes on a two-dimensional square lattice. Noting
that in general loops in PTN routes are almost absent, the
simplest choice to model a PTN route is a self-avoiding walk
(SAW). It may sound less obvious that a route, apart from
being non self-intersecting, proceeds randomly. However, the
analysis of geographical data has shown that the fractal
dimension of PTN routes closely coincides with that of a
two-dimensional SAW, $d_f = 4/3$ [58]. To incorporate all the
above features the model is set up as follows. A model PTN
consists of R routes of S stations each constructed on a
possibly periodic X x X square lattice. The dynamics of the
route generation adheres to the following rules:

1. Construct the first route as a SAW of S lattice sites.

2. Construct the R - 1 subsequent routes as SAWs with the
following preferential attachment rules:

a) choose a terminal station at $x_0$ with probability

$p \sim k_{x_0} + a/X^2$; (28)

b) choose any subsequent station x of the route with
probability

$p \sim k_x + b$. (29)

In (28), (29) $k_x$ is the number of times the lattice site x
has been visited before (the number of routes that pass
through x). Note that to ensure the SAW property any route
that intersects itself is discarded and its construction is
restarted with step 2a).
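
The growth rules above can be sketched in a few lines. The following is a minimal illustration, not the simulation code used for the results reported here. Assumptions of the sketch: the lattice is periodic, each step chooses among the four nearest neighbors of the current site with weights $k_x + b$, the terminal of the first route is placed uniformly at random, a uniform choice is used when all four weights vanish, and the names `grow_ptn`, `pick_terminal` and `try_route` are hypothetical.

```python
import random
from collections import Counter

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def grow_ptn(R, S, X, a, b, seed=0):
    """Grow R self-avoiding routes of S stations on a periodic X x X lattice."""
    rng = random.Random(seed)
    visits = Counter()   # k_x: number of routes passing through site x
    routes = []

    def pick_terminal():
        # Rule 2a, p ~ k_x0 + a / X**2, sampled as a mixture: a uniform site
        # (total weight a) or a visited site weighted by k_x (total weight sum k).
        total_k = sum(visits.values())
        if total_k == 0 or rng.random() * (total_k + a) < a:
            return (rng.randrange(X), rng.randrange(X))
        sites = list(visits)
        return rng.choices(sites, weights=[visits[x] for x in sites])[0]

    def try_route():
        x = pick_terminal()
        route, seen = [x], {x}
        for _ in range(S - 1):
            nbrs = [((x[0] + dx) % X, (x[1] + dy) % X) for dx, dy in STEPS]
            weights = [visits[n] + b for n in nbrs]      # rule 2b, p ~ k_x + b
            if sum(weights) == 0:
                weights = [1.0] * len(nbrs)              # uniform fallback (assumption)
            x = rng.choices(nbrs, weights=weights)[0]
            if x in seen:
                return None                              # self-intersection: discard
            route.append(x)
            seen.add(x)
        return route

    while len(routes) < R:
        route = try_route()          # a discarded route restarts at step 2a
        if route is not None:
            routes.append(route)
            visits.update(route)     # every site of the new route gains one visit
    return routes

# Small illustrative run; realistic sizes are far larger.
routes = grow_ptn(R=5, S=6, X=20, a=0.0, b=0.5, seed=1)
```

Discarding and restarting self-intersecting walks becomes slow for long routes, so a production simulation would likely need a more efficient SAW sampling scheme.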



B. Global topology of model PTN 

Let us first investigate the global topology of this model
as a function of its parameters. We first fix both the number
of routes R and the number of stations S per route as well as
the size of the lattice X. This leaves us with essentially
two parameters a and b, Eqs. (28), (29). Dependencies on R,
S, and X will be studied below.

For the real-world PTNs as studied in the previous sections,
almost all stations belong to a single component, GCC, with
the possible exception of a very small number of routes.
Within the network, however, we often observe what above we
called the harness effect of several routes proceeding in
parallel for a sequence of stations. Let us first investigate
from a global point of view which parameters a and b
reproduce realistic maps of PTNs. In Fig. 18 we show
simulated PTNs on lattices 300 x 300 for R = 1024, S = 64 and
different values of the parameters a and b. Each route is
represented by a continuous line tracing the path along its
sequence of stations. For representation purposes, parallel
routes are shown slightly shifted. Thus, the line thickness
and intensity of colors indicate the density of the routes.

FIG. 18: PTN maps of different simulated cities of size
300 x 300 with R = 1024 routes of S = 64 stations each (color
online). First row: a = 0, b = 0.1 to 0.5. Second row:
b = 0.5, a = 15 to 500. With an increase of b routes cover
more and more area. Increase of a leads to clusterisation of
the network.

TABLE V: Characteristics of the simulated PTN with X = 300,
a = 0 for different parameters R, S, and b. The rest of
notations as in Table II.

Parameter b governs the evolution of each single subsequent
route. If b = 0 each subsequent route is restricted to follow
the previous one. The change of simulated PTNs with b for
fixed a = 0 is shown in the first row of Fig. 18. For small
values of b = 0 to 0.1 the PTNs obtained result in almost all
routes following the same path with only a few deviations.
Increasing b from b = 0.1 to b = 0.2, the area covered by the
routes increases while the majority of the routes are
concentrated on a small number of paths. Further increasing
b to b = 0.5 and beyond, we find a wider distributed coverage
with the central part of the network remaining the most
densely covered area. This is due to the non-equilibrium
growth process described by Eqs. (28), (29).



The parameter a quantifies the possibility to start a new
route outside the existing network. For vanishing a = 0 the
resulting network always consists of a single connected
component, while for finite values of a a few or many
disconnected components may occur. The results found for
a = 0 and varying b parameters are completely independent of
the lattice size X provided X is sufficiently large. When
introducing a finite a parameter, however, new routes may be
started anywhere on the lattice, which results in a strong
lattice size dependency. To partly compensate for this, the
impact of a has been normalized by $X^2$ in (28). The
dependence of the simulated PTN maps on a for fixed b = 0.5
is shown in the second row of Fig. 18. For a < 15 one
observes the formation of a single large cluster with only a
few individual routes occurring outside this cluster.
Slightly increasing a beyond a = 15 one finds a sharp
transition to a situation with several (two or more)
clusters. For much larger values of a the number of clusters
further increases and the situation becomes more and more
homogeneous: the routes tend to cover all available lattice
space area.



C. Statistical characteristics of model PTN 

From the above qualitative investigation we conclude that
realistic PTN maps are obtained for small or vanishing a and
$b \geq 0.5$. To quantitatively investigate the dependence of
the simulated networks on their parameters including R and S,
let us now compare their statistical characteristics with
those we have empirically obtained for the real-world
networks. In Table V we have chosen to list the same
characteristics of the simulated PTN as they are displayed
for the real-world networks in Table II. To provide for
additional checks of the correlations between simulated and
real-world networks, we present the characteristics in all L,
P, and C-spaces. Let us note that our choice of the
underlying grid to be a square lattice limits the number of
nearest neighbors of a given station in L-space to
$k_L \leq 4$. Moreover, as far as no direct links between
these neighbors occur, the clustering coefficient in L-space
vanishes, $c_L = 0$. Nonetheless, as we discuss below, both
characteristics display nontrivial behavior similar to
real-world networks when displayed in P and C-spaces.

For reasons explained above we choose a vanishing parameter
a = 0 and b = 0.5 and for comparison b = 5.0. The data shown
in the Table was obtained for simulated PTNs of different
numbers of routes, R = 256, 512, 1024 and route lengths
S = 16, 32, 64. In the range of parameters covered in the
Table we observe only weak changes of the various
characteristics. Natural trends are that with the increase of
the number of routes R the maximal and mean shortest path
length increases in all spaces. Most pronounced this is
observed in P-space, while it is weakest in C-space. A
similar increase is observed in P-space when increasing the
number of stations S per route. Choosing the values of R in
the range R = 256 to 1024 and S = 16, S = 32, the average and
maximal values of the characteristics studied here are found
within the ranges seen for real-world PTNs, see Table II.
More detailed information is contained in the distributions
of these characteristics and their correlations.

We restrict the further discussion to simulated PTNs
described by R = 256, 512, 1024, S = 16, 32, and a = 0,
b = 0.5, which appear to reproduce many of the
characteristics of real-world PTNs. In Fig. 19 we display the
mean shortest path length distribution for these selected
PTNs in L, P, and C-spaces. In L-space we observe two groups
of distributions which correspond to the two route lengths
S = 16 and S = 32. The most probable values of the path
length $\ell_L$ are of the order of the corresponding S. In P
and C-spaces the distributions are very similar with most
probable path lengths $\ell_P \approx 3$, $\ell_C \approx 2$.
In all cases the distributions are well fitted by the
asymmetric unimodal distribution (12) and resemble those of
the real-world networks shown in Fig. [Tj Varying b = 0.2 to
5 does not significantly change this picture.

Let us now examine the node degree distributions of the
simulated PTNs selected above. As explained above, the
L-space degrees are restricted by the geometry of the
underlying square lattice. Thus, of the representations
discussed here, one may observe non-trivial distributions
only in P, C, and B-spaces. Fig. 20a shows the cumulative
node degree distribution in P-space in semi-logarithmic
scale. Recall that for the majority of real-world PTNs
studied in Section III as well as in other works [22, 27],
the P-space node degree distribution was found to decay
exponentially. All distributions shown in Fig. 20a display
two regions, each governed by an exponential decay with a
separate scale. Note that increasing both S and R leads to an
increase of the ranges over which these regions extend.
Comparing these results with those of Fig. 4b for real-world
PTNs we find that all ranges observed there are also
reproduced here. Within the parameter ranges chosen here the
current model does not seem to attain a power law node degree
distribution in P-space.

Comparing the C-space node degree distributions for
real-world and simulated PTNs (Figs. ^ and 20b,
correspondingly) one finds a definite tendency to an
exponential behavior with two different scales in both cases.
Note, however, that for the simulated PTNs the scales
increase with the number of routes R while they decrease with
the number of stations per route S.

The simulated results discussed so far concerned data
obtained for individual instances of modeled PTNs. One of the
reasons for this was to reduce the computational effort
required for the calculation of path lengths, betweennesses,
and related global characteristics. Furthermore, in
particular for the simulations involving a high number of
routes, some self-averaging may be expected to occur. The
latter assumption was tested and verified by (i) simulating a
reasonable set of PTNs with the same choice of parameters and
(ii) by performing large-scale simulations calculating local
characteristics. A result of the latter procedure involving
averages over up to $3 \cdot 10^4$ instances of simulated
networks is shown in Fig. 21a. There we show the node degree
distribution of the station nodes in B-space, i.e. the
bipartite network of routes and stations with the inherent
neighborhood relation (see Fig. 3). As can be seen in the
double logarithmic plot shown in Fig. 21a, a power-law like
behavior of this distribution that extends over a wider range
is found for small values of the parameter b. This
corresponds to a situation where one finds many routes to
proceed in parallel (compare with the maps shown in Fig. 18).
For the more realistic choices of the b parameter the overall
behavior of this distribution is described by an exponential
decay. The scale of this decay strongly depends on b.
Fig. 21b shows that similar distributions for the real cities
have oscillating character, which is caused by the fact that
non-cumulative distributions are plotted. Similarly,
individual distributions for simulated PTNs are in general
non-monotonic, however the large number average of the
distribution appears to be monotonically decreasing.
Nevertheless, comparing plots in Figs. 21a and 21b one sees
that in general the model is capable of reproducing the
global decay properties of the station node degree
distributions in B-space.

FIG. 19: Mean shortest path length distribution $P(\ell)$ for
several simulated PTNs. a: L-space; b: P-space; c: C-space.
Symbols correspond to simulation results, curves to fits of
unimodal distributions.

FIG. 20: Cumulative node degree distributions
$P^{\mathrm{cum}}(k)$ for several simulated PTNs in (a) P and
(b) C-spaces.



In Fig. 22 we show the betweenness-degree correlation for the
simulated PTN with X = 300, R = 500, S = 50, a = 0, and
b = 0.5. There, we present the mean betweenness centrality
$\langle C^b(k)\rangle$ in C, P, and B-spaces. Corresponding
plots for a real-world network are shown in Fig. 14. Plots
displayed for the simulated networks in Figs. 22a - 22c
qualitatively reproduce the behavior of
$\langle C^b(k)\rangle$ observed for the real-world networks
in C, P, and B-spaces. L-space behavior cannot be reproduced
due to the restrictions caused by the geometry of the
underlying square lattice.

In Figs. 23a and 23b we plot the cumulative harness
distributions P(r, s) for two simulated networks with R =
256, S = 32, a = 0 and different values of the parameter b:
b = 0.2 (Fig. 23a) and b = 1.0 (Fig. 23b). Similar plots
for real-world networks are given in Figs. 16 and 17. The
plots of Fig. 23 reproduce the two regimes empirically
observed for the real-world PTNs. In the first, the harness
distribution is governed by a power-law decay (25), Fig.
23a, whereas in the other there is a tendency towards an
exponential decay (26), Fig. 23b. A prominent feature
demonstrated by Fig. 23 is that one can tune the decay
regime by changing the parameter b. For small values of b
the probability that a route proceeds in parallel with other
routes is high, cf. Eq. (29). Therefore, the number of
"hubs" in the P(r, s) distribution, i.e. of lines where several
routes go in parallel, is large for small b. This is reflected
by a power-law decay of the distribution. Conversely,
an increase of b leads to a decrease in the number of such
hubs, as shown by the exponential decay of their distribution.
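
To make the counting behind a harness distribution concrete, here is a minimal sketch. It assumes that P(r, s) counts the sequences of s consecutive stations that are serviced by exactly r routes, that the direction of travel is ignored, and that the cumulative distribution is accumulated over r at fixed s. These conventions, the function names, and the toy routes are assumptions made for illustration and may differ in detail from the procedure used for the figures.

```python
from collections import Counter


def harness_distribution(routes):
    """Return a Counter {(r, s): number of station sequences}.

    routes: list of routes, each a list of station labels in travel order.
    A sequence of s consecutive stations is counted under r when exactly
    r routes contain it (in either direction).
    """
    seq_routes = {}  # canonical station sequence -> set of route indices
    for idx, route in enumerate(routes):
        n = len(route)
        for i in range(n):
            for j in range(i + 2, n + 1):      # sequences of length >= 2
                seq = tuple(route[i:j])
                key = min(seq, seq[::-1])      # ignore direction of travel
                seq_routes.setdefault(key, set()).add(idx)
    dist = Counter()
    for key, idxs in seq_routes.items():
        dist[(len(idxs), len(key))] += 1
    return dist


def cumulative_in_r(dist, s):
    """Cumulative counts over r at fixed s: sum of P(q, s) for q >= r."""
    max_r = max((r for (r, s_) in dist if s_ == s), default=0)
    return {r: sum(dist[(q, s)] for q in range(r, max_r + 1))
            for r in range(1, max_r + 1)}


# Toy example: three routes sharing the segment C-D, two sharing B-C-D.
routes = [["A", "B", "C", "D"],
          ["B", "C", "D", "E"],
          ["X", "C", "D", "Y"]]
dist = harness_distribution(routes)
cum2 = cumulative_in_r(dist, 2)
print(dist[(3, 2)], dist[(2, 3)], cum2)
```

The enumeration is quadratic in the route length, which is acceptable for a sketch but would need a more economical scheme for large networks.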

FIG. 21: a: Station node degree distributions in B-space, averaged over 3000 to 30000 simulated cities. R = 1024, S = 64,
a = 0. Parameter b varies in the range b = 0.1 to 1.0 as shown in the legend. b: Corresponding node degree distributions for
Hamburg (circles), Hong Kong (squares), Los Angeles (triangles), and Istanbul (stars).

FIG. 22: Mean betweenness centrality $\langle C(k) \rangle$ for the simulated city of 300 × 300 sites with R = 500, S = 50, a = 0, and b = 0.5
in C (a), P (b), and B (c) spaces.

FIG. 23: Cumulative harness distributions P(r, s) for the simulated
PTN with R = 256, S = 32. Fig. a: a = 0, b = 0.2,
fig. b: a = 0, b = 1.0. Compare with plots in Figs. 16 and 17 for
the real-world networks.

Summarizing the comparison of the statistical characteristics
of real-world networks with those of simulated
ones, one can state that the model proposed
above captures many essential features of real-world
PTNs. This is especially evident if one includes in the
comparison different network representations (different
spaces), as performed above.



VII. CONCLUSIONS 

This paper was driven by two main objectives in
the analysis of urban public transport networks. First,
we wanted to present a comprehensive survey of the
statistical properties of PTNs based on data for cities of
so far unexplored network size (see Table I). Based on
this survey, the second objective was to present a model
that, albeit simple, is capable of reproducing
a majority of these properties.

Especially helpful in our analysis was the use of different
network representations (different spaces, introduced
in section II). Whereas former PTN studies used some of
these representations, here, within a comprehensive
approach, we calculate PTN characteristics as they show up
in L, P, C, and B-spaces. It is the comparative analysis
of empirical data in different spaces that enabled, in
particular, the adequate PTN modeling presented in section
VI.

The networks under consideration appear to be
strongly correlated small-world structures with high
values of clustering coefficients (especially in L and less in
C-spaces) and comparatively low mean shortest path
values, as listed in Table II. Standard network characteristics
listed there correspond to the features a passenger is
interested in when using public traffic in a given city. To
give several examples, any two stops in Paris are on
average separated by $\langle \ell_{L} \rangle - 1 = 5.4$ stations (with a
maximal value of $\ell_{L}^{\max} - 1 = 27$), and to travel between them
one should on average make $\langle \ell_{P} \rangle - 1 = 1.7$ changes.
Evidence of the correlations present in PTNs is provided by the
power-law node degree distributions observed for many networks in
L and for some in P-space (see Table IV). Currently, we
find no explanation why some of the networks of our survey
are governed by power-law node degree distributions
whereas others follow an exponential decay. In the
analysis of urban street networks a classification has been
found [39, 59] that allows one to discriminate between
properties of different classes of city organization. Let us note,
however, that as a rule the latter analysis is performed for
restricted regions of street networks, i.e. either the
historical or the suburban part. In the case of a PTN, however,
one usually deals with a structure that spreads over the
whole city, covering both the inner and outer regions.
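
The passenger-oriented reading of the mean shortest path values quoted above can be illustrated with a short sketch. It assumes the usual definitions: in L-space two stations are linked if they are consecutive stops of at least one route, and in P-space they are linked if at least one route services both. The function names and the two toy routes are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations


def l_space(routes):
    """Stations are linked if they are consecutive stops of some route."""
    adj = {}
    for route in routes:
        for u in route:
            adj.setdefault(u, set())
        for u, v in zip(route, route[1:]):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def p_space(routes):
    """Stations are linked if some route services both of them."""
    adj = {}
    for route in routes:
        for u in route:
            adj.setdefault(u, set())
        for u, v in combinations(sorted(set(route)), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_shortest_path(adj):
    """Mean shortest path length over all reachable ordered pairs (BFS)."""
    total = 0
    pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy example: two routes meeting at station C.
routes = [["A", "B", "C"], ["C", "D", "E"]]
mean_l = mean_shortest_path(l_space(routes))
mean_p = mean_shortest_path(p_space(routes))
stations_between = mean_l - 1   # average number of intermediate stations
changes = mean_p - 1            # average number of changes
print(stations_between, changes)
```

For the toy network the mean L-space path length is 2.0, i.e. one intermediate station on average, and the mean P-space path length is 1.4, i.e. 0.4 changes on average.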

Besides examining traditional network characteristics
(as described in sections III - IV), we addressed here a
specific feature which is unique to PTNs and networks with
similar construction principles. Namely, we analyzed
statistical distributions of public transport routes that go in
parallel for a sequence of stations. As we have shown, such
distributions (we call them harness distributions) are well
defined for the networks under consideration and may
also be used for a quantitative description of similar
networks embedded in 2D or 3D space, such as cables, pipes,
neurons, or (blood) vessels.

The common statistical features of the networks
considered emerge due to their common functional purposes
and construction principles, which are also reflected in the
underlying bipartite structure [57]. It is this structure that
explains part of the correlations present in PTNs [20].
The network growth model we present in Section VI
captures this structure by describing network evolution in terms
of adding public transport routes, each of them being
a complete graph in P-space. Our choice to use a
self-avoiding walk (SAW) as a route model in lattice
simulations was motivated by geographical observations and
other reasons, as argued in section VI. In support of the
scaling argument given there, one may note that the
fractal dimension of a SAW on a lattice does not change if
a weak uncorrelated disorder is present, i.e. when some
lattice sites cannot be visited [60]. In turn, this indicates that
the model is robust with respect to weak disturbances of
the underlying lattice structure. Further analysis of
simulated PTNs performed in section VI established strong
similarities in the statistical characteristics of simulated
and real-world networks.
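
The idea of a route as a self-avoiding walk that forms a complete graph in P-space can be sketched as follows. This is only an illustration under stated assumptions: the walk is grown site by site with uniformly chosen free neighbours and restarted when trapped (a simple growth procedure that does not sample SAWs uniformly), the optional set of blocked sites stands in for weak uncorrelated disorder, and the rule that governs how new routes follow existing ones, Eq. (29), is not reproduced. All names and parameter values are hypothetical.

```python
import random
from itertools import combinations


def self_avoiding_walk(size, n_sites, rng, blocked=frozenset(), max_tries=1000):
    """Grow a self-avoiding walk visiting n_sites sites of a size x size lattice.

    blocked: lattice sites that cannot be visited (illustrative disorder).
    The walk is restarted from a new random site whenever it gets trapped.
    """
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(max_tries):
        start = (rng.randrange(size), rng.randrange(size))
        if start in blocked:
            continue
        walk = [start]
        visited = {start}
        while len(walk) < n_sites:
            x, y = walk[-1]
            options = [(x + dx, y + dy) for dx, dy in moves
                       if 0 <= x + dx < size and 0 <= y + dy < size
                       and (x + dx, y + dy) not in visited
                       and (x + dx, y + dy) not in blocked]
            if not options:          # trapped: abandon this attempt
                break
            nxt = rng.choice(options)
            walk.append(nxt)
            visited.add(nxt)
        if len(walk) == n_sites:
            return walk
    raise RuntimeError("no self-avoiding walk found within max_tries")


rng = random.Random(0)
blocked = frozenset({(3, 3), (10, 7)})          # two sites that cannot be visited
route = self_avoiding_walk(size=20, n_sites=10, rng=rng, blocked=blocked)

# In P-space the route contributes a complete graph on its stations.
p_edges = list(combinations(route, 2))
print(len(route), len(p_edges))
```

Each route of S stations generated in this way adds S(S - 1)/2 links in P-space, which is how growth by whole routes builds up the network.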

The two objectives of the PTN study that we
have achieved so far in this paper, the empirical
analysis and the modeling, naturally call for an analytic
approach. In particular, such an approach may be used in
parallel with numerical simulations to derive statistical
properties of the model proposed in section VI. This will
be a task for forthcoming studies. Another natural
continuation of this work will be to analyze different, possibly
dynamic, phenomena that may occur on and with PTNs.
A particular task will be to study the robustness of PTNs to
targeted attacks and random failures [29].

Yu.H. acknowledges support of the Austrian FWF 
project 19583-PHY. 